{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Export data processing pipeline to torchscript for further deployment without Python environment\n",
    "\n",
    "## Introduction\n",
    "\n",
    "When deploying models, we are more likely to use other languages (e.g. C++) instead of Python. This post provides a way by which users could save Chronos data processing pipeline as torchscript files to make it available without Python environment.\n",
    "\n",
    "In this guide, we will\n",
    "\n",
    "1. Develop a TCNForecaster with nyc_taxi dataset.\n",
    "2. Export the data processing pipeline to torchscript after the forecaster is developed.\n",
    "3. Show users how to use the saved pipeline when deploying the forecaster.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝**Note**\n",
    "> \n",
    "> - Except exporting data processing pipeline to torchscript, you could also export the whole forecasting pipeline to torchscript. There is another [how-to guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_export_torchscript_files.html) that illustrates this in detail, you may refer to it for more information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forecaster developing\n",
    "\n",
    "First let's prepare the data. We will manually download the data to show the details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run following\n",
    "!wget https://raw.githubusercontent.com/numenta/NAB/v1.0/data/realKnownCause/nyc_taxi.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we create a `TSDataset` instance based on the data and do preprocessing. You could refer to\n",
    "\n",
    "- [How to preprocess my own data](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_preprocess_my_data.html)\n",
    "\n",
    "for details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "from bigdl.chronos.data import TSDataset\n",
    "import pandas as pd\n",
    "\n",
    "# load the data to pandas dataframe\n",
    "df = pd.read_csv(\"nyc_taxi.csv\", parse_dates=[\"timestamp\"])\n",
    "\n",
    "# create TSDataset instance\n",
    "train_data, _, test_data = TSDataset.from_pandas(df,\n",
    "                                                 dt_col=\"timestamp\",\n",
    "                                                 target_col=\"value\",\n",
    "                                                 repair=False,\n",
    "                                                 with_split=True,\n",
    "                                                 test_ratio=0.1)\n",
    "\n",
    "# create a scaler for data scaling\n",
    "scaler = StandardScaler()\n",
    "\n",
    "# preprocess train_data (scale and roll sampling)\n",
    "train_data.scale(scaler, fit=True) \\\n",
    "          .roll(lookback=48, horizon=24)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the prepared data, we could easily develop a forecaster. You may refer to other how-to guides for more detail.\n",
    "\n",
    "- [How to create a Forecaster](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_create_forecaster.html)\n",
    "- [Train forcaster on single node](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.chronos.forecaster import TCNForecaster\n",
    "\n",
    "# create a forecaster from tsdataset\n",
    "forecaster = TCNForecaster.from_tsdataset(train_data)\n",
    "\n",
    "# train the forecaster\n",
    "forecaster.fit(train_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When a forecaster is developed with satisfying accuracy and performance, you could export the data processing pipeline to torchscript for further deployment. Currently preprocessing (including `scale` and `roll`) and postprocessing (including `unscale_numpy`) can be exported by calling `tsdataset.export_jit(path_dir=None, drop_dt_col=True)`, please check [API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.export_jit) for more detailed information and current limitations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# create a directory to save the pipeline\n",
    "saved_dir = Path(\"jit_module\")\n",
    "saved_dir.mkdir(exist_ok=True)\n",
    "\n",
    "# export data processing pipeline to torchscript\n",
    "train_data.export_jit(path_dir=saved_dir, drop_dt_col=True)\n",
    "\n",
    "# save the test_data to csv files for deployment\n",
    "# make sure to set `index=False` to make the saved data have the same structure as original data\n",
    "test_data.df.to_csv(\"deployment_data.csv\", index=False)\n",
    "\n",
    "# export the forecaster to torchscript files for deployment\n",
    "forecaster_path = saved_dir / \"forecaster\"\n",
    "forecaster_path.mkdir(exist_ok=True)\n",
    "forecaster.export_torchscript_file(dirname=forecaster_path, quantized_dirname=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the preprocessing pipeline and postprocessing pipeline are saved in `jit_module/tsdata_preprocessing.pt` and `jit_module/tsdata_postprocessing.pt`, you could load them when deploying the forecaster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forecaster deployment\n",
    "\n",
    "With the saved \".pt\" files, you could do data preprocessing and postprocessing without Python environment. We provide a deployment workflow example in Python here since Python code can be directly executed in jupyter notebook, besides, a code snip representing the core deployment workflow in C++ using libtorch API is also presented below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# deployment example in Python\n",
    "\n",
    "import torch\n",
    "\n",
    "# load the data from csv file\n",
    "deployment_df = pd.read_csv(\"deployment_data.csv\", parse_dates=[\"timestamp\"])\n",
    "\n",
    "# drop the datetime column because we specified `drop_dt_col=True` when exporting the pipeline\n",
    "# now the data structure is same as data used in developing\n",
    "deployment_df.drop(columns=\"timestamp\", inplace=True)\n",
    "\n",
    "# create input tensor\n",
    "input_tensor = torch.from_numpy(deployment_df.values).type(torch.float64)\n",
    "\n",
    "# load the saved pipelines\n",
    "preprocess_path = saved_dir / \"tsdata_preprocessing.pt\"\n",
    "postprocess_path = saved_dir / \"tsdata_postprocessing.pt\"\n",
    "preprocess_module = torch.jit.load(preprocess_path)\n",
    "postprocess_module = torch.jit.load(postprocess_path)\n",
    "\n",
    "# preprocessing\n",
    "preprocess_output = preprocess_module.forward(input_tensor)\n",
    "\n",
    "# load the forecaster and inference\n",
    "forecaster_module_path = forecaster_path / \"ckpt.pth\"\n",
    "forecaster_module = torch.jit.load(forecaster_module_path)\n",
    "inference_output = forecaster_module.forward(preprocess_output)\n",
    "\n",
    "# postprocessing\n",
    "postprocess_output = postprocess_module.forward(inference_output)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare the result with the output of original deployment pipeline using Chronos API\n",
    "\n",
    "from numpy.testing import assert_array_almost_equal\n",
    "\n",
    "# preprocessing\n",
    "test_data.scale(scaler, fit=False)\\\n",
    "         .roll(lookback=48, horizon=24, is_predict=True)\n",
    "input_data = test_data.to_numpy()\n",
    "\n",
    "# inference\n",
    "forecaster_module = torch.jit.load(forecaster_module_path)\n",
    "inference_output_original = forecaster_module.forward(torch.from_numpy(input_data))\n",
    "\n",
    "# postprocessing\n",
    "postprocess_output_original = test_data.unscale_numpy(inference_output_original)\n",
    "\n",
    "# compare the results\n",
    "assert_array_almost_equal(postprocess_output.numpy(), postprocess_output_original)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deployment in C++\n",
    "\n",
    "The following code describes the core deployment workflow in C++ using libtorch APIs. You could refer to [installation guide](https://pytorch.org/cppdocs/installing.html) to install libtorch, and more information of APIs is available at [libtorch API doc](https://pytorch.org/cppdocs/api/library_root.html).\n",
    "\n",
    "```C++\n",
    "// core deployment workflow example in C++\n",
    "\n",
    "#include <torch/torch.h>\n",
    "#include <torch/script.h>\n",
    "\n",
    "// Create input tensor from your data, you should implement this function.\n",
    "// The data to create input tensor should have the same format as the data used in developing.\n",
    "// If you sepcified drop_dt_col=True when exporting the pipelines, you should skip the\n",
    "// datatime column here to keep the same structure as the developing data.\n",
    "torch::Tensor input_tensor = create_input_tensor(data);\n",
    "\n",
    "// load the preprocessing pipeline\n",
    "torch::jit::script::Module preprocessing;\n",
    "preprocessing = torch::jit::load(preprocessing_path);\n",
    "\n",
    "// run data preprocessing\n",
    "torch::Tensor preprocessing_output = preprocessing.forward(input_tensor).toTensor();\n",
    "\n",
    "// inference using your trained model, replace \"trained_model\" with your model\n",
    "torch::Tensor inference_output = trained_model(preprocessing_output)\n",
    "\n",
    "// load the postprocessing pipeline\n",
    "torch::jit::script::Module postprocessing;\n",
    "postprocessing = torch::jit::load(postprocessing_path);\n",
    "\n",
    "// run postprocessing\n",
    "torch::Tensor output = postprocessing.forward(inference_output).toTensor()\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.15 ('chronos-dev')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.15"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "edcf3a1384e0c635c6a1374bbfb4bfc5de419d28ba5059a4c674cfed4e784c9a"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
